Faster Compact On-Line Lempel-Ziv Factorization

Authors

  • Jun-ichi Yamamoto
  • Tomohiro I
  • Hideo Bannai
  • Shunsuke Inenaga
  • Masayuki Takeda
Abstract

We present a new on-line algorithm for computing the Lempel-Ziv factorization of a string that runs in $O(N \log N)$ time and uses only $O(N \log \sigma)$ bits of working space, where $N$ is the length of the string and $\sigma$ is the size of the alphabet. This is a notable improvement over previous on-line algorithms, which use the same order of working space but run in either $O(N \log^3 N)$ time (Okanohara & Sadakane 2009) or $O(N \log^2 N)$ time (Starikovskaya 2012). The key to our new algorithm is the use of an elegant but less popular index structure called the Directed Acyclic Word Graph, or DAWG (Blumer et al. 1985). We also present an opportunistic variant of our algorithm which, given the run-length encoding of size $m$ of a string of length $N$, computes the Lempel-Ziv factorization of the string on-line in $O\left(m \cdot \min\left\{\frac{(\log\log m)(\log\log N)}{\log\log\log N},\ \sqrt{\frac{\log m}{\log\log m}}\right\}\right)$ time and $O(m \log N)$ bits of space.

1998 ACM Subject Classification: F.2.2 Nonnumerical Algorithms and Problems
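To make the objects in the abstract concrete, here is a minimal sketch of the two definitions it relies on: the Lempel-Ziv factorization and the run-length encoding. It is a naive brute-force illustration, not the DAWG-based algorithm of the paper. It assumes the common self-referencing variant in which each factor is either a fresh character or the longest prefix of the remaining text that also starts at an earlier position. The function names `lz_factorize` and `run_length_encode` are hypothetical.

```python
from itertools import groupby


def lz_factorize(s: str) -> list[str]:
    """Naive LZ factorization (assumed self-referencing variant).

    Each factor is the longest prefix of s[i:] that also occurs starting
    at some earlier position j < i (overlap with position i is allowed),
    or a single character if no such occurrence exists.
    Brute force: far slower than the O(N log N) bound in the abstract.
    """
    factors = []
    i, n = 0, len(s)
    while i < n:
        longest = 0
        for j in range(i):  # candidate earlier starting positions
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            longest = max(longest, k)
        length = max(longest, 1)  # a fresh character is a factor of length 1
        factors.append(s[i:i + length])
        i += length
    return factors


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Run-length encoding as (character, run length) pairs.

    The number of pairs corresponds to the size m used in the abstract.
    """
    return [(ch, len(list(run))) for ch, run in groupby(s)]


if __name__ == "__main__":
    print(lz_factorize("abababab"))       # ['a', 'b', 'ababab']
    print(run_length_encode("aaabbbbc"))  # [('a', 3), ('b', 4), ('c', 1)]
```

The factorization scans the text left to right and never needs characters beyond the current factor, which is what makes an on-line computation possible. The algorithm in the paper replaces the inner brute-force search with queries on an incrementally built index.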

Similar resources

Computing Reversed Lempel-Ziv Factorization Online

Kolpakov and Kucherov proposed a variant of the Lempel-Ziv factorization, called the reversed Lempel-Ziv (RLZ) factorization (Theoretical Computer Science, 410(51):5365–5373, 2009). In this paper, we present an on-line algorithm that computes the RLZ factorization of a given string w of length n in O(n log n) time using O(n log σ) bits of space, where σ ≤ n is the alphabet size. Also, we introd...

On Tinhofer's Linear Programming Approach to Isomorphism Testing

On the complexity of master problems Emergence on decreasing sandpile models 14:35 Kosolobov Durand, Romashchenko Faster lightweight Lempel-Ziv parsing Quasiperiodicity and non-computability in tilings On the Complexity of Noncommutative Polynomial Factorization

Lempel-Ziv Factorization May Be Harder Than Computing All Runs

The complexity of computing the Lempel-Ziv factorization and the set of all runs (= maximal repetitions) is studied in the decision tree model of computation over ordered alphabet. It is known that both these problems can be solved by RAM algorithms in O(n log σ) time, where n is the length of the input string and σ is the number of distinct letters in it. We prove an Ω(n log σ) lower bound on ...

On the Size of Lempel-Ziv and Lyndon Factorizations

Lyndon factorization and Lempel-Ziv (LZ) factorization are both important tools for analysing the structure and complexity of strings, but their combinatorial structure is very different. In this paper, we establish the first direct connection between the two by showing that while the Lyndon factorization can be bigger than the non-overlapping LZ factorization (which we demonstrate by describin...

arXiv:1211.3642v2 [cs.DS] 18 Jan 2013: Simpler and Faster Lempel

We present a new, simple, and efficient approach for computing the Lempel-Ziv (LZ77) factorization of a string in linear time, based on suffix arrays. Computational experiments on various data sets show that our approach constantly outperforms the fastest previous algorithm LZ_OG (Ohlebusch and Gog 2011), and can be up to 2 to 3 times faster in the processing after obtaining the suffix array, w...

Publication date: 2014